Learning Real-World Acrobatic Flight from Human Preferences
Merk, Colin, Geles, Ismail, Xing, Jiaxu, Romero, Angel, Ramponi, Giorgia, Scaramuzza, Davide
Preference-based reinforcement learning (PbRL) enables agents to learn control policies without requiring manually designed reward functions, making it well-suited for tasks where objectives are difficult to formalize or inherently subjective. Acrobatic flight poses a particularly challenging problem due to its complex dynamics, rapid movements, and the importance of precise execution. In this work, we explore the use of PbRL for agile drone control, focusing on the execution of dynamic maneuvers such as powerloops. Building on Preference-based Proximal Policy Optimization (Preference PPO), we propose Reward Ensemble under Confidence (REC), an extension to the reward learning objective that improves preference modeling and learning stability. Our method achieves 88.4% of the shaped reward performance, compared to 55.2% with standard Preference PPO. We train policies in simulation and successfully transfer them to real-world drones, demonstrating multiple acrobatic maneuvers where human preferences emphasize stylistic qualities of motion. Furthermore, we demonstrate the applicability of our probabilistic reward model in a representative MuJoCo environment for continuous control. Finally, we highlight the limitations of manually designed rewards, observing only 60.7% agreement with human preferences. These results underscore the effectiveness of PbRL in capturing complex, human-centered objectives across both physical and simulated domains.
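At the heart of such preference-based reward learning is a Bradley-Terry model over pairs of trajectory segments: the learned return of the human-preferred segment should exceed that of the other. Below is a minimal sketch of that objective over a small reward ensemble; the architecture and hyperparameters are illustrative, and the paper's REC confidence weighting itself is not reproduced here.

```python
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    """Small ensemble of MLP reward models r_i(s, a) -> scalar."""

    def __init__(self, obs_dim, act_dim, n_members=3, hidden=64):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_members)
        )

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        # -> (n_members, batch, segment_len, 1)
        return torch.stack([m(x) for m in self.members])

def preference_loss(model, seg_a, seg_b, prefs):
    """Bradley-Terry objective: P(a preferred over b) = sigmoid(R(a) - R(b)).

    seg_a, seg_b: (obs, act) pairs of shape (batch, segment_len, dim);
    prefs: float labels of shape (batch,), 1.0 if segment a was preferred.
    """
    ret_a = model(*seg_a).sum(dim=2).squeeze(-1)  # (n_members, batch)
    ret_b = model(*seg_b).sum(dim=2).squeeze(-1)
    logits = ret_a - ret_b                        # one logit per ensemble member
    labels = prefs.expand_as(logits)
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```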
Task Scheduling & Forgetting in Multi-Task Reinforcement Learning
Speckmann, Marc, Eimer, Theresa
Reinforcement learning (RL) agents can forget tasks they have previously been trained on. There is a rich body of work on such forgetting effects in humans, so we look for commonalities in the forgetting behavior of humans and RL agents across tasks and test the viability of forgetting prevention measures from learning theory in RL. We find that in many cases, RL agents exhibit forgetting curves similar to those of humans. Methods like the Leitner system or SuperMemo have been shown to be effective at counteracting human forgetting, but we demonstrate that they do not transfer as well to RL. We identify a likely cause: asymmetrical learning and retention patterns between tasks that cannot be captured by retention-based or performance-based curriculum strategies.
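For reference, a Leitner system reviews items from a cascade of boxes at geometrically spaced intervals, promoting items on success and demoting them on failure. The sketch below adapts that scheme to task scheduling in multi-task RL; the promotion rule based on an evaluation-return threshold is an illustrative assumption, not the exact protocol from the paper.

```python
import random

class LeitnerTaskScheduler:
    """Leitner-style scheduler: tasks the agent masters migrate to higher
    boxes, which are revisited at geometrically longer intervals."""

    def __init__(self, tasks, n_boxes=4):
        self.boxes = [list(tasks)] + [[] for _ in range(n_boxes - 1)]
        self.step = 0

    def next_task(self):
        # Box k becomes eligible every 2**k steps; pick from the lowest
        # eligible non-empty box so struggling tasks are reviewed most often.
        self.step += 1
        for k, box in enumerate(self.boxes):
            if box and self.step % (2 ** k) == 0:
                return random.choice(box)
        return random.choice([t for b in self.boxes for t in b])

    def report(self, task, eval_return, threshold):
        # Promote the task on success; demote it to box 0 on failure,
        # mirroring how forgotten items are reset in human spaced repetition.
        k = next(i for i, b in enumerate(self.boxes) if task in b)
        self.boxes[k].remove(task)
        dst = min(k + 1, len(self.boxes) - 1) if eval_return >= threshold else 0
        self.boxes[dst].append(task)
```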
Structural Design Through Reinforcement Learning
Rochefort-Beaudoin, Thomas, Vadean, Aurelian, Aage, Niels, Achiche, Sofiane
This paper introduces the Structural Optimization gym (SOgym), a novel open-source Reinforcement Learning (RL) environment designed to advance machine learning in Topology Optimization (TO). SOgym enables RL agents to generate physically viable and structurally robust designs by integrating the physics of TO into the reward function. To enhance scalability, SOgym leverages feature-mapping methods as a mesh-independent interface between the environment and the agent, allowing efficient interaction with the design variables regardless of mesh resolution. Baseline results use a model-free Proximal Policy Optimization agent and a model-based DreamerV3 agent. Of the three observation space configurations tested, the one inspired by the TopOpt game, an interactive educational tool that builds students' intuition for designing structures that minimize compliance under volume constraints, performed best in both final performance and sample efficiency. The 100M-parameter version of DreamerV3 produced structures within 54% of the compliance achieved by traditional optimization methods, with a 0% disconnection rate, an improvement over supervised learning approaches that often struggle with disconnected load paths. When comparing the agents' learning rates to those of engineering students from the TopOpt game experiment, the DreamerV3-100M model learns approximately four orders of magnitude more slowly, an impressive feat for a policy trained from scratch through trial and error. These results suggest RL's potential to solve continuous TO problems and its capacity to explore and learn from diverse design solutions. SOgym provides a platform for developing RL agents for complex structural design challenges and is publicly available to support further research in the field.
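SOgym itself is open source; the snippet below only illustrates the feature-mapping idea the abstract refers to, projecting a parametric bar primitive onto a density grid of arbitrary resolution so the agent's action space stays mesh-independent. The smoothed-distance-field formulation is one common choice, not necessarily SOgym's.

```python
import numpy as np

def bar_to_density(grid_shape, center, length, angle, thickness, sharpness=50.0):
    """Project a parametric bar primitive onto a density grid of arbitrary
    resolution via a smoothed distance field (one common feature-mapping
    formulation; coordinates are normalized to the unit square)."""
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, grid_shape[0]),
        np.linspace(0.0, 1.0, grid_shape[1]),
        indexing="ij",
    )
    offset = np.stack([xs - center[0], ys - center[1]], axis=-1)
    axis = np.array([np.cos(angle), np.sin(angle)])
    # Distance from each cell center to the bar's axis segment.
    along = np.clip(offset @ axis, -length / 2, length / 2)
    dist = np.linalg.norm(offset - along[..., None] * axis, axis=-1)
    # Smooth Heaviside: ~1 inside the bar, ~0 outside.
    return 1.0 / (1.0 + np.exp(sharpness * (dist - thickness / 2)))

# The same action (center, length, angle, thickness) renders consistently at
# any resolution, e.g. bar_to_density((64, 64), (0.5, 0.5), 0.6, 0.0, 0.1)
# versus bar_to_density((256, 256), (0.5, 0.5), 0.6, 0.0, 0.1).
```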
Improving Environment Robustness of Deep Reinforcement Learning Approaches for Autonomous Racing Using Bayesian Optimization-based Curriculum Learning
Banerjee, Rohan, Ray, Prishita, Campbell, Mark
Deep reinforcement learning (RL) approaches have been broadly applied to a large number of robotics tasks, such as robot manipulation and autonomous driving. However, an open problem in deep RL is learning policies that are robust to variations in the environment, which is an important condition for such systems to be deployed in real-world, unstructured settings. Curriculum learning is one approach that has been applied to improve generalization performance in both supervised and reinforcement learning domains, but selecting the appropriate curriculum to achieve robustness can be a user-intensive process. In our work, we show that performing probabilistic inference of the underlying curriculum-reward function using Bayesian optimization can be a promising technique for finding a robust curriculum. We demonstrate that a curriculum found with Bayesian optimization can outperform a vanilla deep RL agent and a hand-engineered curriculum in the domain of autonomous racing with obstacle avoidance. Our code is available at https://github.com/PRISHIta123/Curriculum_RL_for_Driving.
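Concretely, such a search can treat curriculum parameters (say, how quickly obstacle density ramps up) as inputs to a black-box objective that trains an agent briefly and scores it on held-out environment variations. A hedged sketch using scikit-optimize follows; train_and_evaluate is a stand-in for the user's own training loop, and the parameter names and ranges are hypothetical.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

def train_and_evaluate(ramp_rate, start_difficulty):
    # Stand-in for a short RL training run under the given curriculum,
    # followed by evaluation on held-out track/obstacle variations.
    # Replace this stub with real training and evaluation code.
    noise = 0.01 * np.random.default_rng(0).normal()
    return -(ramp_rate - 0.3) ** 2 - (start_difficulty - 0.1) ** 2 + noise

def objective(params):
    ramp_rate, start_difficulty = params
    return -train_and_evaluate(ramp_rate, start_difficulty)  # gp_minimize minimizes

result = gp_minimize(
    objective,
    dimensions=[Real(0.01, 1.0, name="ramp_rate"),
                Real(0.0, 0.5, name="start_difficulty")],
    n_calls=25,          # each call is one (short) training run
    random_state=0,
)
print("best curriculum parameters:", result.x)
```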
MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling
Ott, Julius, Servadei, Lorenzo, Arjona-Medina, Jose, Rinaldi, Enrico, Mauro, Gianfranco, Lopera, Daniela Sánchez, Stephan, Michael, Stadelmayer, Thomas, Santra, Avik, Wille, Robert
Data selection is essential for any data-driven optimization technique, including Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-value estimation and therefore cannot adapt the sampling strategy, including the exploration and exploitation of transitions, to the complexity of the task. This paper therefore proposes a new sampling strategy that leverages the exploration-exploitation trade-off. It is enabled by uncertainty estimation of the Q-value function, which guides sampling toward more significant transitions and thus a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments and show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards, improving convergence and peak performance by 26% on average.
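To make the idea concrete, here is a hedged sketch of uncertainty-aware buffer sampling: ensemble disagreement over Q-values scores each stored transition as an exploration signal, TD error as an exploitation signal, and a mixing coefficient trades the two off. The specific priority formula is an illustrative assumption, not the paper's exact MEET rule.

```python
import numpy as np

def sample_indices(td_errors, q_ensemble_std, batch_size, beta=0.5, temp=1.0):
    """Sample replay-buffer indices with a tunable exploration-exploitation mix.

    td_errors:      |TD error| per stored transition (exploitation signal)
    q_ensemble_std: std of ensemble Q-estimates per transition (exploration)
    beta:           0.0 -> pure exploitation, 1.0 -> pure exploration
    """
    def softmax(x):
        z = np.exp((x - x.max()) / temp)
        return z / z.sum()

    # Convex mix of two softmax distributions is itself a distribution.
    priority = (1.0 - beta) * softmax(td_errors) + beta * softmax(q_ensemble_std)
    priority /= priority.sum()  # guard against floating-point drift
    rng = np.random.default_rng()
    return rng.choice(len(priority), size=batch_size, p=priority, replace=False)

# Usage (requires batch_size <= buffer size):
# idx = sample_indices(np.abs(td), q_std, batch_size=64, beta=0.7)
```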
Continual Reinforcement Learning with Multi-Timescale Replay
Kaplanis, Christos, Clopath, Claudia, Shanahan, Murray
In this paper, we propose a multi-timescale replay (MTR) buffer for improving continual learning in RL agents faced with environments that change continuously over time at timescales unknown to the agent. The basic MTR buffer comprises a cascade of sub-buffers that accumulate experiences at different timescales, enabling the agent to improve the trade-off between adaptation to new data and retention of old knowledge. We also combine the MTR framework with invariant risk minimization [Arjovsky et al., 2019], with the aim of encouraging the agent to learn a policy that is robust across the various environments it encounters over time. The MTR methods are evaluated in three different continual learning settings on two continuous control tasks and, in many cases, show improvement over the baselines.
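A minimal sketch of the cascade structure follows; the rule that an evicted item passes to the next sub-buffer with fixed probability is one simple choice, and the paper's exact overflow mechanism may differ.

```python
import random
from collections import deque

class MultiTimescaleReplay:
    """Cascade of FIFO sub-buffers. New transitions enter sub-buffer 0;
    an item evicted from a full sub-buffer moves to the next one with
    probability pass_prob (an illustrative overflow rule), so deeper
    sub-buffers retain progressively older experience."""

    def __init__(self, n_buffers=4, total_capacity=10_000, pass_prob=0.5):
        self.buffers = [deque() for _ in range(n_buffers)]
        self.capacity = total_capacity // n_buffers
        self.pass_prob = pass_prob

    def add(self, transition):
        item, level = transition, 0
        while item is not None and level < len(self.buffers):
            buf = self.buffers[level]
            buf.append(item)
            # Evict the oldest item once this sub-buffer is over capacity.
            item = buf.popleft() if len(buf) > self.capacity else None
            if item is not None and random.random() >= self.pass_prob:
                item = None  # dropped instead of cascading onward
            level += 1

    def sample(self, batch_size):
        # Uniform sampling over the union mixes recent and old experience.
        pool = [t for buf in self.buffers for t in buf]
        return random.sample(pool, min(batch_size, len(pool)))
```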
Investigating Simple Object Representations in Model-Free Deep Reinforcement Learning
Davidson, Guy, Lake, Brenden M.
We explore the benefits of augmenting state-of-the-art model-free deep reinforcement learning algorithms with simple object representations. Following the Frostbite challenge posited by Lake et al. (2017), we identify object representations as a critical cognitive capacity lacking from current reinforcement learning agents. We discover that providing the Rainbow model (Hessel et al., 2018) with simple, feature-engineered object representations substantially boosts its performance on the Frostbite game from Atari 2600. We then analyze the relative contributions of the representations of different types of objects, identify environment states where these representations are most impactful, and examine how these representations aid in generalizing to novel situations.
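For concreteness, here is a hedged sketch of what simple, feature-engineered object representations can look like: a fixed-length vector of per-object positions and velocities that can be fed to the agent alongside (or instead of) pixels. The object types and slot layout are illustrative for a Frostbite-like game, not the paper's exact features.

```python
import numpy as np

def object_feature_vector(objects, slots_per_type, types=("agent", "platform", "bird")):
    """Flatten detected game objects into a fixed-length feature vector.

    objects: dicts like {"type": "bird", "x": .., "y": .., "vx": .., "vy": ..}
    Each object type owns a fixed number of 4-dim slots (x, y, vx, vy);
    unused slots stay zero so the vector length never varies.
    """
    feats = np.zeros(len(types) * slots_per_type * 4, dtype=np.float32)
    used = {t: 0 for t in types}
    for obj in objects:
        t = obj["type"]
        if t not in used or used[t] >= slots_per_type:
            continue  # unknown type, or all slots for this type are full
        base = (types.index(t) * slots_per_type + used[t]) * 4
        feats[base:base + 4] = (obj["x"], obj["y"], obj["vx"], obj["vy"])
        used[t] += 1
    return feats

# Usage:
# vec = object_feature_vector(
#     [{"type": "bird", "x": 0.2, "y": 0.7, "vx": -0.01, "vy": 0.0}],
#     slots_per_type=4)
```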
Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games
Shen, Macheng, How, Jonathan P.
This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward can depend on the uncertain opponent type (private information known only to the opponent itself and its ally). To maximize the reward, the protagonist agent has to infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategic interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training can improve the robustness of the agent's policy against different opponents, but it also significantly increases the computational overhead. To achieve a good trade-off between the robustness of the learned policy and the computational cost, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist's policy, due to the intrinsic stochasticity of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme that dynamically updates the opponent ensemble to optimize an objective function balancing robustness and computational cost. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.
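The following is a hedged sketch of the evaluation-driven ensemble update described above: opponents against whom the protagonist scores poorly are retained with higher probability, and sampling rather than a hard top-k selection guards against the noise in the evaluation returns. The softmax selection rule is an illustrative stand-in for the paper's stochastic optimization scheme.

```python
import numpy as np

def update_opponent_ensemble(pool, protagonist_returns, max_size, temp=1.0, rng=None):
    """Stochastically reselect the opponent ensemble.

    pool:                current ensemble plus newly trained candidate opponents
    protagonist_returns: noisy mean return of the protagonist against each
                         opponent in pool (lower return = harder opponent)
    Harder opponents are kept with higher probability, balancing robustness
    against the compute cost of maintaining a large ensemble.
    """
    rng = rng or np.random.default_rng()
    hardness = -np.asarray(protagonist_returns, dtype=float)
    probs = np.exp((hardness - hardness.max()) / temp)
    probs /= probs.sum()
    keep = rng.choice(len(pool), size=min(max_size, len(pool)),
                      p=probs, replace=False)
    return [pool[i] for i in keep]
```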